February 16, 2021

Gene expression

I start the analysis by finding the differentially expressed genes.

To do so, I test whether the means of the control and disease groups differ, starting with a simple two-sample t-test. For each gene, I also check whether the group variances are equal before comparing the group means.
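Below is a minimal Python sketch of this per-gene comparison. The report does not name the variance check, so I assume an F test like R's `var.test`. If that test does not reject equal variances at `alpha`, the sketch uses Student's pooled t-test; otherwise it uses Welch's t-test. The incomplete beta helper is there only so the sketch runs on the standard library alone. All function names are hypothetical.

```python
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction of the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # One even and one odd step of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t distribution with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def f_two_sided_p(f, d1, d2):
    """Two-sided p-value of the F test for equal variances."""
    cdf = betainc(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))
    return min(1.0, 2.0 * min(cdf, 1.0 - cdf))


def two_sample_t_test(x, y, alpha=0.05):
    """Return (t, df, p, equal_var) for one gene's control (x) and disease (y) values."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    # Assumption: equal variances are checked with a two-sided F test (as in R's var.test).
    equal_var = f_two_sided_p(vx / vy, nx - 1, ny - 1) > alpha
    if equal_var:
        # Student's t-test: pooled variance, nx + ny - 2 degrees of freedom.
        pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
        se = math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
        df = nx + ny - 2
    else:
        # Welch's t-test: separate variances, Welch-Satterthwaite degrees of freedom.
        sx, sy = vx / nx, vy / ny
        se = math.sqrt(sx + sy)
        df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    t = (statistics.fmean(x) - statistics.fmean(y)) / se
    return t, df, t_two_sided_p(t, df), equal_var
```

For example, `two_sample_t_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])` keeps the equal-variance branch and returns t = -5 with 8 degrees of freedom.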

P-values from t-test: before and after correction

Gene expression - corrected test scheme

The number of differentially expressed genes turned out to be very high, so I check whether the normality assumption distorts the results by applying a new test scheme:

  • Check whether the values in each group are normally distributed.
  • If both groups are normal, perform a t-test, checking the equality of variances beforehand.
  • Otherwise, perform a Mann-Whitney test.
  • Correct the obtained p-values for multiple testing with the Benjamini-Hochberg method.
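The scheme above can be sketched with the Python standard library. The report does not say which normality test was used; Shapiro-Wilk is common in R. Jarque-Bera is swapped in here only because its p-value has a closed form. The Mann-Whitney p-value uses the normal approximation with tie and continuity corrections, like R's `wilcox.test(..., exact = FALSE)`. The t-test is passed in as a function `t_test_p(x, y)` so that any implementation can be plugged in. All names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def jarque_bera_p(values):
    """Jarque-Bera normality test; the statistic is chi-square with 2 df under H0."""
    n = len(values)
    mu = statistics.fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)  # survival function of chi-square(2)


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie and continuity correction."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = (i + j) / 2.0 + 1.0  # average rank of the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    nx, ny = len(x), len(y)
    n = nx + ny
    u = sum(ranks[:nx]) - nx * (nx + 1) / 2.0
    sigma = math.sqrt(nx * ny / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    z = u - nx * ny / 2.0
    z = (z - math.copysign(0.5, z)) / sigma if z != 0 else 0.0
    cdf = NormalDist().cdf(z)
    return 2.0 * min(cdf, 1.0 - cdf)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def gene_pvalue(x, y, t_test_p, alpha=0.05):
    """t-test if both groups pass the normality check, else Mann-Whitney."""
    # Assumption: Jarque-Bera stands in for the unnamed normality test.
    if jarque_bera_p(x) > alpha and jarque_bera_p(y) > alpha:
        return t_test_p(x, y)
    return mann_whitney_p(x, y)


def corrected_scheme(genes, t_test_p, alpha=0.05):
    """genes: {name: (control_values, disease_values)} -> {name: BH-adjusted p-value}."""
    names = list(genes)
    raw = [gene_pvalue(*genes[g], t_test_p, alpha) for g in names]
    return dict(zip(names, benjamini_hochberg(raw)))
```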

P-values after taking the distribution into account

Distribution effect

I compare the results obtained when the distribution is taken into account with those obtained under the previous normality assumption.
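One simple way to make this comparison concrete is to split the genes by which scheme calls them significant. This sketch assumes a 0.05 cutoff on the adjusted p-values, since the report does not state its threshold. The gene names and values are made up.

```python
def compare_schemes(adj_a, adj_b, alpha=0.05):
    """Compare the significant calls of two schemes (dicts gene -> adjusted p-value)."""
    # Assumption: a gene is significant when its adjusted p-value is below 0.05.
    sig_a = {g for g, p in adj_a.items() if p < alpha}
    sig_b = {g for g, p in adj_b.items() if p < alpha}
    return {
        "both": sig_a & sig_b,
        "only_first": sig_a - sig_b,
        "only_second": sig_b - sig_a,
    }


# Hypothetical adjusted p-values from the t-test-only scheme and the distribution-aware one.
t_only = {"GENE1": 0.001, "GENE2": 0.02, "GENE3": 0.30}
aware = {"GENE1": 0.004, "GENE2": 0.08, "GENE3": 0.01}
summary = compare_schemes(t_only, aware)
```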

Enrichment analysis

After identifying the differentially expressed genes, I proceed with enrichment analysis. I start with over-representation analysis (ORA) and then move on to functional class scoring (FCS) methods.

ORA

##                                                 Title corrected_pvals
## 100                                     RNA transport    4.259285e-08
## 133                                        Cell cycle    4.259285e-08
## 305                               MicroRNAs in cancer    1.182302e-07
## 260                                 Alzheimer disease    1.960257e-06
## 265                                     Prion disease    1.960257e-06
## 295           Human T-cell leukemia virus 1 infection    1.960257e-06
## 300                                Pathways in cancer    2.529952e-06
## 88                                 Metabolic pathways    4.283633e-06
## 263                                Huntington disease    4.283633e-06
## 266 Pathways of neurodegeneration - multiple diseases    4.546480e-06
## 115                            Fanconi anemia pathway    6.612935e-06
## 170                                    Focal adhesion    6.612935e-06
## 304                           Proteoglycans in cancer    6.612935e-06
## 262                     Amyotrophic lateral sclerosis    7.685330e-06
## 101                         mRNA surveillance pathway    2.554206e-05
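ORA is usually a one-sided hypergeometric (Fisher's exact) test of the overlap between the differentially expressed genes and each gene set. I assume that is what produced the p-values above before multiple-testing correction. A minimal sketch follows, assuming the gene universe is all tested genes. The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_universe, n_in_set, n_de, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(n_universe, n_de)
    hits = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_overlap, min(n_in_set, n_de) + 1)
    )
    return hits / total


def ora_pvalue(de_genes, universe, gene_set):
    """One-sided over-representation p-value of gene_set among the DE genes."""
    # Assumption: the universe is all tested genes; genes outside it are ignored.
    universe = set(universe)
    de = set(de_genes) & universe
    members = set(gene_set) & universe
    return hypergeom_upper_tail(len(universe), len(members), len(de), len(de & members))
```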

CERNO

##                                       Title corrected_pvals
## 88                       Metabolic pathways    2.368986e-08
## 133                              Cell cycle    2.368986e-08
## 300                      Pathways in cancer    2.368986e-08
## 295 Human T-cell leukemia virus 1 infection    4.941884e-08
## 170                          Focal adhesion    1.231707e-07
## 305                     MicroRNAs in cancer    5.131701e-07
## 159      Vascular smooth muscle contraction    5.800451e-07
## 116                  MAPK signaling pathway    2.139096e-06
## 119                  Rap1 signaling pathway    2.139096e-06
## 125             Chemokine signaling pathway    4.895844e-06
## 304                 Proteoglycans in cancer    4.895844e-06
## 148              PI3K-Akt signaling pathway    5.527751e-06
## 144                             Endocytosis    5.931361e-06
## 177     Complement and coagulation cascades    5.931361e-06
## 121              cGMP-PKG signaling pathway    7.034663e-06
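CERNO scores a gene set by the ranks its members reach in the list of all genes, ordered from most to least significant. The statistic F = -2 Σ ln(R_i / N) follows a chi-square distribution with 2k degrees of freedom for k genes in the set. For an even number of degrees of freedom the chi-square tail has a closed form, so this sketch needs only the standard library. The report does not say how the genes were ranked; ordering by p-value is assumed here. The names are hypothetical.

```python
import math


def chi2_sf_even(x, df):
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def cerno(ranked_genes, gene_set):
    """ranked_genes: all genes, most significant first. Returns (statistic, p-value)."""
    # Assumption: genes are ranked by their differential-expression p-value.
    n = len(ranked_genes)
    members = set(gene_set)
    ranks = [r for r, g in enumerate(ranked_genes, start=1) if g in members]
    stat = -2.0 * sum(math.log(r / n) for r in ranks)
    return stat, chi2_sf_even(stat, 2 * len(ranks))
```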

Z-transform

##                                                 Title corrected_pvals
## 88                                 Metabolic pathways    9.001854e-28
## 300                                Pathways in cancer    1.231452e-22
## 266 Pathways of neurodegeneration - multiple diseases    9.693740e-16
## 295           Human T-cell leukemia virus 1 infection    1.234043e-15
## 148                        PI3K-Akt signaling pathway    2.047843e-14
## 116                            MAPK signaling pathway    6.369432e-14
## 133                                        Cell cycle    6.369432e-14
## 170                                    Focal adhesion    6.369432e-14
## 260                                 Alzheimer disease    2.115722e-13
## 305                               MicroRNAs in cancer    3.002434e-13
## 304                           Proteoglycans in cancer    4.702153e-13
## 265                                     Prion disease    3.590622e-12
## 263                                Huntington disease    4.907427e-12
## 119                            Rap1 signaling pathway    6.906820e-12
## 294                    Human papillomavirus infection    5.373930e-11
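If the Z-transform method here is Stouffer's Z-transform test, each gene's one-sided p-value becomes a normal score. The scores of a set's members are summed and divided by √k. The report does not spell out the method, so this is an assumption. A minimal standard-library sketch with hypothetical names:

```python
import math
from statistics import NormalDist


def stouffer(pvals):
    """Stouffer's Z-transform: z_i = Phi^-1(1 - p_i), Z = sum(z_i) / sqrt(k)."""
    nd = NormalDist()
    # -inv_cdf(p) equals inv_cdf(1 - p) but keeps precision for tiny p.
    z = sum(-nd.inv_cdf(p) for p in pvals) / math.sqrt(len(pvals))
    return z, nd.cdf(-z)


def gene_set_z(gene_pvals, gene_set):
    """Combine the one-sided per-gene p-values of the set members."""
    # Assumption: 'Z-transform' means Stouffer's method on one-sided gene p-values.
    return stouffer([gene_pvals[g] for g in gene_set if g in gene_pvals])
```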

GSEA implementation
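The comparisons that follow vary two pieces: the ranking metric and the enrichment score. The metric is signal to noise or LFC, optionally in absolute value. The score is the weighted running sum from the original GSEA paper, with weight p = 1. This is a minimal sketch of both. It assumes expression values are already on a log2 scale, so LFC is a difference of means. Permutation p-values are left out, and all names are hypothetical.

```python
import statistics


def signal_to_noise(a, b):
    """(mean_a - mean_b) / (sd_a + sd_b), the usual GSEA signal-to-noise metric."""
    return (statistics.fmean(a) - statistics.fmean(b)) / (statistics.stdev(a) + statistics.stdev(b))


def log_fold_change(a, b):
    """Difference of group means."""
    # Assumption: values are log2-scaled, so this difference is the LFC.
    return statistics.fmean(a) - statistics.fmean(b)


def enrichment_score(scores, gene_set, weight=1.0):
    """GSEA running-sum enrichment score (Subramanian et al., 2005)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    members = set(gene_set) & set(scores)
    hit_norm = sum(abs(scores[g]) ** weight for g in members)
    miss_step = 1.0 / (len(ranked) - len(members))
    running = best = 0.0
    for g in ranked:
        if g in members:
            running += abs(scores[g]) ** weight / hit_norm
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best
```

The absolute variants rank genes by `{g: abs(s) for g, s in scores.items()}` instead of the signed scores.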

Signal to noise absolute - p-values

We can see that the MATLAB output is clearly anomalous.

Signal to noise absolute - ES

## P-value:  4.997284e-42
## Correlation coefficient:  -0.6518573

Signal to noise - p-values

Signal to noise - ES

## P-value:  0.261582
## Correlation coefficient:  0.06141717

LFC absolute - p-values

LFC absolute - ES

## P-value:  2.1936e-26
## Correlation coefficient:  -0.5360192

LFC - p-values

LFC - ES

## P-value:  0.007659464
## Correlation coefficient:  0.1452507

PLAGE

##                                                    Title corrected_pvals
## 56                        Glycerophospholipid metabolism    9.275123e-53
## 171                             ECM-receptor interaction    9.405913e-53
## 238                                   Insulin resistance    1.147170e-52
## 233                            Relaxin signaling pathway    8.115912e-52
## 240 AGE-RAGE signaling pathway in diabetic complications    9.085146e-52
## 88                                    Metabolic pathways    5.763372e-51
## 148                           PI3K-Akt signaling pathway    6.818963e-51
## 27                       Arginine and proline metabolism    1.392011e-50
## 254                     Protein digestion and absorption    1.478016e-50
## 305                                  MicroRNAs in cancer    1.478016e-50
## 235  Parathyroid hormone synthesis, secretion and action    1.543308e-50
## 173                                    Adherens junction    2.323504e-50
## 273                            Vibrio cholerae infection    2.323504e-50
## 144                                          Endocytosis    3.092973e-50
## 166                             Apelin signaling pathway    3.250911e-50
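PLAGE summarizes a gene set by the first right singular vector of its gene-wise standardized expression matrix (genes × samples). This gives one activity value per sample. Those activities are then compared between the groups; the report does not say with which test. This standard-library sketch finds the vector by power iteration instead of a full SVD. The names and the iteration count are hypothetical.

```python
import math
import statistics


def plage_scores(expr, n_iter=200):
    """PLAGE activity per sample for one gene set.

    expr: one row per gene in the set, each holding that gene's values across
    all samples. Returns the first right singular vector of the gene-wise
    standardized matrix; its sign is arbitrary.
    """
    # Assumption: no gene in the set has zero variance (it would divide by zero).
    z = []
    for row in expr:
        mu, sd = statistics.fmean(row), statistics.pstdev(row)
        z.append([(v - mu) / sd for v in row])
    n = len(z[0])
    # Gram matrix Z^T Z (samples x samples): its top eigenvector is the
    # first right singular vector of Z.
    gram = [[sum(r[i] * r[j] for r in z) for j in range(n)] for i in range(n)]
    # Power iteration. A constant start vector would be mapped to zero because
    # every row of Z is centred, so start from the largest row of the Gram matrix.
    v = max(gram, key=lambda row: sum(x * x for x in row))
    for _ in range(n_iter):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```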

GSVA

##                                        Title corrected_pvals
## 159       Vascular smooth muscle contraction    1.514201e-41
## 178                      Platelet activation    3.632513e-41
## 228               Oxytocin signaling pathway    4.745869e-38
## 231                          Renin secretion    2.135954e-37
## 166                 Apelin signaling pathway    2.934893e-37
## 131        Phospholipase D signaling pathway    4.966612e-35
## 121               cGMP-PKG signaling pathway    5.562491e-35
## 119                   Rap1 signaling pathway    1.047192e-34
## 109                   PPAR signaling pathway    1.069441e-33
## 230    Regulation of lipolysis in adipocytes    2.691752e-32
## 118                    Ras signaling pathway    1.665631e-31
## 185 C-type lectin receptor signaling pathway    1.665631e-31
## 116                   MAPK signaling pathway    2.880683e-31
## 211                     Long-term depression    1.213920e-30
## 204           Neurotrophin signaling pathway    2.380521e-30

log P-values correlation

Results comparison

##                    ORA CERNO   Z PLAGE LFC LFC_abs S2N S2N_abs GSVA
## enriched gene sets 114   200 263   338   0       0   1       1  267

Gene set enriched with absolute signal to noise: MicroRNAs in cancer
Gene set enriched with signal to noise: cGMP-PKG signaling pathway

Jointly enriched gene sets (excluding GSEA)

## Number of joint enriched gene sets:  101
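The counts in the comparison table and the number of jointly enriched gene sets amount to thresholding each method's corrected p-values and intersecting the resulting sets. This sketch assumes an adjusted p-value below 0.05 counts as enriched; the report does not state the cutoff. The set names and values are made up.

```python
def enriched_by_method(results, alpha=0.05):
    """results: {method: {gene_set: adjusted p}} -> {method: set of enriched gene sets}."""
    # Assumption: "enriched" means adjusted p < 0.05.
    return {m: {s for s, p in pvals.items() if p < alpha} for m, pvals in results.items()}


def joint_enriched(enriched, exclude=()):
    """Gene sets enriched by every method not listed in `exclude`."""
    kept = [sets for m, sets in enriched.items() if m not in exclude]
    return set.intersection(*kept) if kept else set()


# Hypothetical corrected p-values for three gene sets and three methods.
results = {
    "ORA": {"SET_A": 1e-8, "SET_B": 0.2, "SET_C": 1e-6},
    "CERNO": {"SET_A": 1e-8, "SET_B": 1e-5, "SET_C": 1e-7},
    "GSEA_S2N": {"SET_A": 0.6, "SET_B": 0.9, "SET_C": 0.4},
}
enriched = enriched_by_method(results)
counts = {m: len(s) for m, s in enriched.items()}
joint = joint_enriched(enriched, exclude={"GSEA_S2N"})
```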

Combining p-values

##                                                    Title pval_combined
## 88                            cGMP-PKG signaling pathway  0.000000e+00
## 253                                  MicroRNAs in cancer  0.000000e+00
## 249                                   Pathways in cancer  1.584990e-55
## 121                   Vascular smooth muscle contraction  1.472594e-50
## 132                                       Focal adhesion  2.176197e-49
## 86                                Rap1 signaling pathway  3.639168e-48
## 83                                MAPK signaling pathway  7.100317e-48
## 252                              Proteoglycans in cancer  3.974759e-47
## 140                                  Platelet activation  1.954365e-46
## 99                                            Cell cycle  3.033170e-45
## 194 AGE-RAGE signaling pathway in diabetic complications  1.031549e-43
## 245              Human T-cell leukemia virus 1 infection  1.504357e-42
## 108                                          Endocytosis  1.713326e-42
## 136                                       Tight junction  2.279999e-42
## 85                                 Ras signaling pathway  1.532563e-41
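The report does not name the combination method. This sketch assumes Fisher's method applied to a gene set's p-values from the different methods, and it works in log space. A combined p-value printed as 0.000000e+00 can be one that is smaller than the smallest positive double (about 4.9e-324) and underflowed; the log form keeps such values comparable. The names are hypothetical.

```python
import math


def fisher_log_pvalue(pvals):
    """Fisher's method, returned as (statistic, natural-log p-value).

    X = -2 * sum(ln p_i) is chi-square with 2k df under H0, and for even df
    P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    # Assumption: p-values are combined with Fisher's method.
    k = len(pvals)
    stat = -2.0 * sum(math.log(p) for p in pvals)
    half = stat / 2.0
    if half == 0.0:
        return stat, 0.0  # all p-values equal 1
    # log-sum-exp over the terms (x/2)^i / i! to avoid overflow and underflow
    logs = [i * math.log(half) - math.lgamma(i + 1) for i in range(k)]
    top = max(logs)
    log_sum = top + math.log(sum(math.exp(t - top) for t in logs))
    return stat, -half + log_sum
```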

Visualizations

cGMP-PKG signaling pathway

MicroRNAs in cancer

Pathways in cancer

Vascular smooth muscle contraction

Focal adhesion

Rap1 signaling pathway

MAPK signaling pathway

Proteoglycans in cancer

Platelet activation

Cell cycle